Optimal and Information Theoretic Syntactic Pattern Recognition for Traditional Errors
نویسندگان
چکیده
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by which a string U A* can be transformed into any Y A* by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be Functionally Complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. Experimental results which involve dictionaries with strings of lengths between 7 and 14 with an overall average noise of 39.75 % demonstrate the superiority of our system over existing methods. The model also has applications in speech and uni-dimensional signal processing.
منابع مشابه
A formal theory for optimal and information theoretic syntactic pattern recognition
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by whic...
متن کاملPattern Recognition of Strings Containing Traditional and Generalized Transposition Errors1
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straight...
متن کاملPattern Recognition of Strings with Substitutions, Insertions, Deletions and Generalized Transpositions1
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straight...
متن کاملPeptide classification using optimal and information theoretic syntactic modeling
We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we shall demonstrate that the Optimal and Information Theoretic (OIT) model...
متن کاملOn Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification
Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of biosequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve peptide classificati...
متن کامل